Deviation detection in text using conceptual graph interchange format and error tolerance dissimilarity function
Abstract
The rapid growth in the volume of textual data has driven increasing research interest in mining text to detect deviations. Specialized methods have emerged for specific domains to meet various needs in discovering rare patterns in text. This paper focuses on a graph-based approach to text representation and presents a novel error tolerance dissimilarity algorithm for deviation detection. We resolve two non-trivial problems: the semantic representation of text and the complexity of graph matching. We employ the conceptual graph interchange format (CGIF), a knowledge representation formalism, to capture the structure and semantics of sentences, and we propose a novel error tolerance dissimilarity algorithm to detect deviations in the CGIFs. We evaluate our method by analyzing real-world financial statements to identify deviating performance indicators, and we show that it performs better than two related text-based graph similarity measures. The proposed method identifies deviating sentences, and its results correlate strongly with expert judgments. Furthermore, it supports error-tolerant matching of CGIFs, and its complexity remains linear as the number of CGIFs increases.
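The abstract does not spell out the error tolerance dissimilarity algorithm, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' method. It assumes a simplified CGIF subset (concepts like `[Revenue: *a]`, dyadic relations like `(Attr ?a ?b)`), flattens each graph into (concept, relation, concept) triples, uses Python's `difflib.SequenceMatcher` ratio as a stand-in for error tolerance on concept labels, pairs triples greedily, and scores every CGIF against a single hypothetical reference graph so the number of graph comparisons grows linearly with the number of CGIFs. The names `parse_cgif`, `dissimilarity`, `detect_deviations`, and the 0.8 and 0.5 thresholds are hypothetical.

```python
import re
from difflib import SequenceMatcher

# Simplified CGIF subset (an assumption for illustration): concepts such as
# "[Revenue: *a]" or "[Person: John *y]", and dyadic relations such as "(Attr ?a ?b)".
CONCEPT_RE = re.compile(r"\[\s*([A-Za-z][\w-]*)\s*:?\s*([^\]*]*?)\s*\*(\w+)\s*\]")
RELATION_RE = re.compile(r"\(\s*([A-Za-z][\w-]*)((?:\s+\?\w+)+)\s*\)")


def parse_cgif(text):
    """Turn a simplified CGIF string into a set of (concept, relation, concept) triples."""
    # Map each defining label (*a) to its lower-cased concept type.
    concepts = {var: ctype.lower() for ctype, _referent, var in CONCEPT_RE.findall(text)}
    triples = set()
    for rel, args in RELATION_RE.findall(text):
        names = re.findall(r"\?(\w+)", args)
        # Only dyadic relations whose arguments are defined concepts are kept.
        if len(names) == 2 and all(n in concepts for n in names):
            triples.add((concepts[names[0]], rel.lower(), concepts[names[1]]))
    return triples


def label_similarity(a, b):
    """Character-level similarity in [0, 1]; a hypothetical stand-in for error tolerance."""
    return SequenceMatcher(None, a, b).ratio()


def triple_similarity(t, u, threshold):
    """Relations must match exactly; both concept labels must be at least `threshold` similar."""
    (s1, r1, o1), (s2, r2, o2) = t, u
    if r1 != r2:
        return 0.0
    subj, obj = label_similarity(s1, s2), label_similarity(o1, o2)
    if min(subj, obj) < threshold:
        return 0.0
    return (subj + obj) / 2


def dissimilarity(g1, g2, threshold=0.8):
    """0.0 for identical triple sets, 1.0 when no triple can be matched."""
    if not g1 and not g2:
        return 0.0
    unmatched = list(g2)
    total = 0.0
    for t in g1:
        # Greedily pair each triple with its best remaining counterpart.
        scores = [triple_similarity(t, u, threshold) for u in unmatched]
        if scores and max(scores) > 0:
            best = scores.index(max(scores))
            total += scores[best]
            unmatched.pop(best)
    return 1.0 - total / max(len(g1), len(g2))


def detect_deviations(reference, statements, cutoff=0.5):
    """Indices of statements whose CGIF is more than `cutoff` away from the reference.

    Each statement is compared with one reference graph only, so the number of
    graph comparisons grows linearly with the number of CGIFs.
    """
    ref = parse_cgif(reference)
    return [i for i, s in enumerate(statements)
            if dissimilarity(ref, parse_cgif(s)) > cutoff]


reference = "[Revenue: *a] [Increase: *b] (Attr ?a ?b)"
statements = [
    "[Revenue: *a] [Increase: *b] (Attr ?a ?b)",   # same as the reference
    "[Revenues: *a] [Increase: *b] (Attr ?a ?b)",  # small spelling variant, tolerated
    "[Revenue: *a] [Decrease: *b] (Attr ?a ?b)",   # opposite indicator
]
print(detect_deviations(reference, statements))  # [2]
```

Here the spelling variant "Revenues" still matches "Revenue" (similarity about 0.93), while "Decrease" versus "Increase" (similarity 0.75) falls below the 0.8 threshold, so only the third statement is flagged. A real system would need a fuller CGIF parser (n-ary relations, nested contexts, coreference) and a dissimilarity measure designed for the task; the thresholds above are arbitrary.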
Similar resources
Outlier detection in financial statements: a text mining method
This paper presents a text mining methodology to extract outlying knowledge from a collection of financial statements. The main idea is to extract relevant financial performance indicators and discover implicit textual description of the indicators. The extracted information was represented using a network language i.e. conceptual graph. Outlier mining was performed on the conceptual graph repr...
EFFICIENT DESIGN OF EMBEDDED SIGNAL PROCESSING SYSTEMS USING TOPOLOGICAL PATTERNS BASED DATAFLOW GRAPH REPRESENTATIONS
Tools for designing signal processing systems with their semantic foundation in dataflow modeling often use high-level graphical user interfaces (GUIs) or text based languages that allow specifying applications as directed graphs. Such graphical representations serve as an initial reference point for further analysis and optimizations that lead to platform-specific implementations. For large-sc...
Detecting Deviations in Text Collections: An Approach Using Conceptual Graphs
Abstract. Deviation detection is an important problem of both data and text mining. In this paper we consider the detection of deviations in a set of texts represented as conceptual graphs. In contrast with statistical and distance-based approaches, the method we propose is based on the concept of generalization and regularity. Among its main characteristics are the detection of rare patterns (...
Implementing Knowledge Interchange for Simulated Entities
This paper describes the techniques, which are being developed and used by Bevilacqua Research Corporation (BRC), to address the cooperative development and reuse of knowledge for intelligent simulations. The work is based on the use of the draft proposed American National Standards (dpANS) Conceptual Graph standard that defines a Conceptual Graph Interchange Format (CGIF). This standard, while...
Data Models for Conceptual Structures
A well-founded data model for Conceptual Structures can help in understanding issues of definitional semantics, efficient implementations and even syntax of proposed languages. This paper presents several useful data models of increasing complexity and applicability that can support Conceptual Structures definitional semantics. The models are presented in Haskell, a non-strict, strongly-typed fu...
Journal title:
- Intell. Data Anal.
Volume 16, Issue
Pages -
Publication date: 2012